Quantum Factoring Performance Evaluation - Using Parallel Programming - VB2019

The purpose of this document is to evaluate system performance in order to factorize large numbers in two prime numbers using Parallel Computing
The CPU evaluated is an a Intel Core i5 1035G1 @1.00 Ghz (Quad Core)
The programming languages used is Visual Basic 2019. Parallel Processing is implemented using the BackGround Worker.
The Intel Processor used consists of four processors. Each processor is capable to handle 2 parallel threads. That means the Intel Processor consists of eight threads or logical processors. The name of the program is: "VB2019 FindPrim QC"
To perform the tests discussed in this chapter it is important that the PC does not go into sleepcondition.
Windows 10 - Select Configuration. Select System. Select Energy and sleep condition. Select Sleep condition net: Never.

The previous highest number (test 6) factorized using i5 M 460 was: 123018668531826302376221323
The two prime numbers are: 33478071698993 and 36746043666811 or roughly 10^27
Performance = 2153 seconds = 0 hours 35 min and 53 seconds

To perform that same test on a i5 1035G1 is 660 seconds = 11 min and 20 seconds

The current highest number (test 7) factorized using i5 1035G1 and 4 Processors is: 123018668453172635462872529011 or roughly 10^29
The two prime numbers are: 334780716989911 and 367460436668101
Performance = 11823 seconds = 1023 = 3 hours 17 min and 3 seconds

Picture 1
VB2010 Test 6
Form Display
Picture 2a VB2019 Test 6
Picture 2b VB2019 Test 7 - Selected

When you compare the two prime numbers the difference is small. This has an advantage. See Reflection part 2

Performance evalution

The following table shows the perfomance evalution to factorize large numbers of different sizes using one, two three or four processors.
Intel Core 1035G1 @1.00 Ghz
Test 0 1 2 3 4 5 6
pm1 33478079 334780717 3347807173 33478071761 334780716989 3347807169919 33478071698993
pm2 36746063 367460449 3674604371 36746043727 367460436671 3674604366743 36746043666811
amin 38040 380405 3804046 38040455 380404547 3804045461 38040454592
a size 10000 100000 1000000 10000000 100000000 100000000010000000000
pp = 1 0,031 0,051 0,257 2,129 20,112 198 1511
pp = 2 0,024 0,042 0,161 1,235 11,708 122 1155
pp = 3 0,024 0,046 0,167 1,225 11,487 123 1029
pp = 4 0,025 0,034 0,125 0,698 6,383 71,625 666
Table 1: VB2019 FindPrim QC
Intel Core 1035G1 @1.00 Ghz
Test period stop sign
0 615093764914418 562856816646609
1 61509335941560384 89146125164832112
2 6150933432074270820 1787540236663610903
3 615093344377562888880 1101918085787451187506
4 61509334226553083574980 19498278967689246691067
5 6150933422795429791033578 4261854706813765918062172
6 615093342265878039130427760 1202674798664227559604492644
7 61509334226585966610859435500 52424584213408915799818872402
Table 1a: VB2019 FindPrim QC - stop sign

In order to understand the program "VB2019 FindPrim QC" the user should study : Answer question 11 - Calculating Periodicity in 5 easy steps Specific study Example 4 and the concepts amax and amin. The difference between those two parameters is the period.
The Visual Basic "VB2019 FindPrim QC" program follows the same 5 steps.

  1. In test 2 the number to factorize n = 12301866871170953183 which consists of two prime numbers: pm1 = 3347807173 and pm2 = 3674604371. See Picture 2 below
    Calculate (n+1)/2. This is 6150933435585476592
    Calculate sqr(n). This is the number 3507401726
    Subtract those two numbers. You get : 6150933432078074866. This is amax. This number is larger than the periodicity.
  2. Now calulate 2 ^ amax mod n or 2 ^ 6150933432078074866 mod 12301866871170953183. You get : 1787540236663610903. This is the stop sign.
  3. Now calculate 2 ^ amin mod 12301866871170953183 = 1787540236663610903 (stop sign)
    This is a "difficult" calculation. See below for more detail. The value you should get is amin = 3804046
  4. Now subtract amin from amax. The value you get is 615093343274270820. That is the period.
    Divide period by 2 or 615093343274270820 by 2. You get: 3075466716037135410. This is per2
    Now perform 2 ^ per2 mod n or 2 ^ 3075466716037135410 mod 12301866871170953183. You get 3151404462955436851. This is y.
  5. Now perform GCD (y+1,n) and GCD(y-1,n). The results are the two prime numbers searched for.
However the last two steps can also be modified to get a much faster algorithm:
  1. Add "amin" to sqsr(n). you get (pm1 + pm2)/2 = 3507401726 + 3804046 = 3511205772 = s/2
    When you multiply this number by 2 you get pm1 + pm2 = s (sum)
    You already have n. which is equal to n = pm1 * pm2.
  2. In order to calculate pm1 and pm2 to you to solve the following equation: (replace pm2 by pm)
    (s - pm)*pm = n or pm^2 - s*pm + n = 0
    That means: pm = (s +- sqr(s^2 - 4 * n))/2 = s/2 +- sqr( (s/2)^2 - n)
    You get pm = 3511205772 +- sqr(3511205772*3511205772 - 12301866871170953183)
    and next = 3511205772 +- sqr(26699102155162801) = 3511205772 +-163398599
    and finally pm1 = 3347807173 and pm2 = 3674604371.
The beauty of this algorithm is that you always get the correct result without any guessing. What is also important is that GCD algorithm is not used.
Step 3 is the most time consuming calculation. In fact you have to perform a loop of 3804046 iterations
This loop consists of the following calculations.
    a = 1: xn = 2
    DO
       a = a + 1
       xn = xn * 2 
       if xn > 12301866871170953183 then xn = xn - 12301866871170953183
    LOOP until  xn = stop_sign  
    amin = a             
In this particular case amin = 3804046. What is important that this is generally speaking a very simple calculation. There is no division involved, which is very time consuming and the only multiplification is with a factor of 2.
  1. Suppose you want to perform this calculation in parallel simultaneous in 4 different processors.
    That means that each processor should roughly perform 3804046/4 = 1000000 calculations.
    That means processor 1 performs the calculations from 1 to 1000000. processor 2 from 1000001 to 2000000. processor 3 from 2000001 to 3000000. and processor 4 from 3000001 to 4000000. In this particular case Processor 4 will return the final result.
  2. When you study the calculation of amin you will see that each iteration depents on its previous result.
    That means that the start calulation of xn in Processor 2 depents on the final value of xn in Processor 1.
    The same that the start calulation of xn in Processor 3 depents on the final value of xn in Processor 2.
    To solve this time delay the program starts with modular exponentiation. That means first the function 2 ^ a mod n is calculated. For processor 2 this is 2 ^ 1000001 mod n
    The program for processor 2 now becomes:
        end_pp = 0
        a = 1000000: 
        calculate: xn = 2^a mod n 
        DO
           a = a + 1
           xn = xn * 2 
           if xn > 12301866871170953183 then xn = xn - 12301866871170953183
        LOOP until  a = 2000000  or  xn = stop_sign  or  end_pp > 0  
        if xn = stop_sign then
            amin = a
            end_pp = 2
        end if              
    
    The parameter end_pp contains the processor # which detected the "stop sign". The parameter end_pp also services as a command to terminate the other processors.
  3. In the above case the value of 1000000 is selected to test the optimum solution. The value of 1000000 is called the a size.That means in each processor only at the start modular exponentiation is performed.
    In reality this optimum value of 1000000 for the parameter a size is not known.
    In case a size is 500000 the following calculations are performed in parallel:
  4. In order to implement parallel programming the concept of BackGroundworker is used. In total three. That means one Processor services as a Master and the three BackGroundWorkers serve as a slave.
    For more information about the BackGroundworker read this: Visual Basic 2019 Parallel Programming techniques
    The first task of the Master is to Assign or activate each slave. After assignment the slave starts a program which performs an endless loop
    Communication between the Master and each Slave is controlled by a state array. Backgroundworker 1 uses state(2). Backgroudworker 2 uses state(3) etc.


"VB2019 FindPrim QC" Program source.

For the readers to try the program there is an executable available. To load this excutable as a Zip file goto: VB2019 FindPrim QC.zip


"VB2019 FindPrim QC" Program Operation.

The program is build using 3 Forms: Control Form, Display Form and Monitor Form:

The parameter "a size"

Table 1 shows for Test = 3 that the parameter amin = 38040455 and a size = 10000000. The parameter N Upd = 1
That means the Monitor display is updated after each 1000000 iterations
In this particular case P1 performs the iterations from 1 to 10000000, P2 from 10000001 to 20000000, P3 from 2000000 to 30000000 and P4 from 30000001 to 40000000
The value amin is 38040455 indicates the iteration which results in the stop sign. This is the iteration performed in P4.

Table 2 below, shows the results for Test 3 and Test 4 and for different values of the parameter a size. The column update shows the total number of updates performed. The number of processors used is 4.

a size Test = 3 update a size Test = 4 update
10000000 0,697 1 100000000 6,688 1
5000000 0,693 2 50000000 6,382 2
4000000 0,842 3 40000000 7,915 3
2000000 0,699 5 20000000 6,423 5
1000000 0,796 10 10000000 7,108 11
500000 0,786 20 5000000 6,877 20
400000 0,853 25 4000000 6,877 26
Table 2: 4 Processors

What Table 2 shows is that almost all the parameter values are identical except for the bottom line with "a size" = 500000 (Test 3) and "a size" = 5000000 (Test 4)
The reason is that in this particular case the final value is calculated in Processor #1, and this is the Processor which is the most busy.
To solve that Load sharing is used. This means that in the case of 4 Processors in Processor #1 70% is done and in the 3 other processors each 110%

a size Test = 3
10000000 82,248
500000089,917
4000000 107,725
2000000 90,226
1000000 102,845
500000 105,509
Table 3: 4 Processors
Picture 6 - Test = 3
a size = 5000000

Table 3 includes Load Sharing The result is that now all the results are more or less identical.

Overview communication between Master and Slaves.

Communication between Processor #1 and Processor #2 (#3, #4) is controlled by the state arrays state1 and state2. The size is 4.
state 1 is used to send data from the master to the slaves
state 2 is used to send data from the slaves to the master
The state values are:Free = 0, Start = 1, Active = 2, Finished = 3, Cancel = 4 and End = 5

           *******                      *****
           *  1  *                      * 4 *
           *     *                      *   *
state 1 ****     ************************   ***********

                 ************************   ***********
                 * 2                  3 *   * 5
                 *                      *   *
state 2 **********                      *****

period     1     2            3         4   5  
Communication between Master and Slaves is controlled by 5 differents periods or events.
  1. Period 1 is identified that the operator enters the two prime numbers and the # of processors to be used and selects the Start pushbotton. If N Proc is 4 The 3 Slave processors are assigned and the Master issues start request. The state 1 array is set to 1.
  2. Period 2 is controlled by the "Back Ground Workers". The "Back Ground Worker" detects that state 1 is a Start request ( = 1 ) and issues a Active request. state 2 becomes a 2. The subroutines Geta_mod2_PP_x is started. (See also next paragraph).
  3. Period 3 is controlled by the subroutine Geta_mod2_PP_x. During its operation no communication with the master is required.
    Period 3 can terminate in 2 ways.
    1. The master or slave detects the "stop sign". If that is the case the variable pr_end is set to the processor # of the master or slave.
    2. The master or slave detectects that the variable pr_end is not equal to zero.
    in both cases the variable state 2 is set to finished. That means becomes 3.
    subroutine Geta_mod2_PP_x is finished, but the Background worker is still active.
  4. Period 4 is controlled by the Master.
    The first thing that the Master does is to test that all the processors go into the finished state. When that has happened the Master issues the Cancel Request and the array state 1 becomes 4.
  5. Period 5 is controlled by the "Back Ground Workers". The "Back Ground Worker" detects that state 1 is a Cancel request ( = 4 ) and issues a End request. state 2 becomes a 5. The "Back Ground Workers" ends.
This terminates all communication and the program.


Additional Software considerations.

Main_Int is central program, which runs in processor 1, that collects the input parameters, controls the three background workers, and calculates the final prime numbers.
Geta_mod2_PP is the name of the subroutine that calculates the periodicity. Main_Int uses Geta_mod2_PP
Geta_mod2_PP_2 is the name of the program which runs in parallel processor 2 and that also calculates the periodicity (part of).
Geta_mod2_PP_3 and Geta_mod2_PP_4 do the same either in processor 3 or 4.

Each of the programs Geta_mod2_PP has the same structure:
a call to the subroutine Modular_2_exp_Int_array and the execution of a loop.
within the loop the subroutine Test_Int_Simple_array is called.
Modular_2_exp_Int_array uses the following subroutines:

The execution time of the loop is the most time consuming.
The subroutine Test_Int_Simple_array does not use any additional subroutines and can be easily copied inside the loop. The type of mathematical operations used are: addition, subtraction and division by 2.
The integer array numbers used contain 9 digits. That means a number like 123456789123456789 is stored in two elements of an array.
The largest integer number for Intel R5 is 19 bits. It is possible to work with larger integer numbers, but that is a typical programming exercise.

The program is written in Visual Basic 2019. It is possible to use an other language like C++. This is also a typical programming exercise.


Comparision between 3 types of processors: 1) Pentium P4 2) i5 M 460 and 3) i5 1035G1

The program tested in all situations is: VB2010 FindPrim QC. The number of processors tested = 1
Intel Core 1035G1 @1.00 Ghz
Test 0 1 2 3 4
Pentium P4 47 256 1319 11197 112000
i5 1035G1 16 30 301 2673 26334

Reflection part 1

Document Shor's Algorithm explains Shor's Algorithm in great detail. Reflection Part 5. - Cryptography. of that same document, shows Table 5 "CPU Performance P4". One line in Table 5 shows a value 29 in green. This whole line represents the same situation as in Test 2 in Table 1 above in this document. This is the Column with the value 28.943 in green . In both cases only one Processor is used.
What Table 1 demonstrates is that by using parallel processing the performance to do factorization can de drastically improved compared with the 1 Processor situation. This is a factor 4 when 4 Processors are used.
You can also describe this a little bit euphoric:
The concept of parallel processing is the ultimum approach to simulate a PC as a Quantum Computer.

This concept leads to a much more general question:
Will a Quantum Computer using the concepts QBits, Superposition and Entanglement ever be able to outperform a PC?
I do not think it does.
In fact the difference in Performance between the two will only increase.


Reflection part 2

The two prime numbers pm1 and pm2 used to test are almost the same. In this paragraph we investigate what happens for different prime number combinations when the number n is almost the same.
The following table shows the result when prime number 1 becomes smaller and prime number 2 increases. The column dtime shows that processing time increases. Column update shows the number of screen updates, which increases because asize is constant. N proc = 4 and N upd = 1
test pm1 pm2 n acalc/amin periodicity dtime update
21 3450000017 3565758503 12301866895967894551 477530 6150933444476545546 60 1
22 3347807173 3674604371 12301866871170953183 3804046 6150933432074270820 124 1/2
23 3300000001 3727838447 12301866878827838447 6517497 6150933435900000000 186 2
24 3200000087 3844333357 12301867076857002059 14764967 6150933534906334308 317 5
25 3000000019 4100622271 12301866890911823149 42909416 6150933441905600430 1007 13
26 2000000011 6150933427 12301866921660267697 568064986 6150933456754667130 10440 143
27 1500000001 8201244581 12301866879701244581 1343220564 6150933435000000000 27605 375
28 1000000007 12301866785 12301866871113067495 3143531670 6150933428905600352 70363 882
What the above shows is:


Feedback

None


Created: 8 November 2013 VB2010
Updated: 1 December 2015 VB2010
Created: 10 AUgust 2021 VB2019

Back to Shor's Algorithm - 10 Questions
Back to my home page Contents of This Document